Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TIME/TIMESTAMP W/O TIME ZONE semantics fix - continuation (v3) #10193

Merged
merged 26 commits into from
Jun 27, 2018

Conversation

findepi
Copy link
Contributor

@findepi findepi commented Mar 19, 2018

Part of #7122, supersedes #9385

@findepi
Copy link
Contributor Author

findepi commented Mar 19, 2018

Currently this is #9385 rebased on master + fix for #9385 (comment)

cc @haozhun . Before you review, we will yet retract hive changes from this.

@losipiuk losipiuk force-pushed the epic/timestampp/pr branch 4 times, most recently from 5166901 to 08cb1fc Compare March 20, 2018 10:23
@losipiuk
Copy link
Contributor

@haozhun Here is the prefix of https://github.com/starburstdata/presto/tree/epic/timestampp/3, which skips the Hive connector changes as we discussed. Feel free to review when you have a chance.

@haozhun
Copy link
Contributor

haozhun commented Apr 5, 2018

Given the current state of this work stream, every one agrees that it's better to merge existing work, and make changes later. As a result, my approach to reviewing this PR is going to be more lenient on the front of general designs. If I have a comment that I consider that is better addressed in a follow up, I'll point that out. And those should eventually go in #10326. I would like to emphasize that design choices that is merged from this work stream is subject to change in later commits.

@haozhun
Copy link
Contributor

haozhun commented Apr 5, 2018

FYI, Your commits are not correctly ordered.

Notes:

  • I'll be editing this comment.
  • If an item is not prefixed with TODO, I recommend fixing it before merging. But if it turns out to be annoying, we can make it a TODO. On the other hand, if you find a TODO easy to do it before merging, you can also just do it (but don't do it if doing so involves adding new commits).
  • When I say TODO, I don't know if a later commit addresses it. I'll edit as I go.
  • I assume political zone is going to be disallowed in TIME WITH TIME ZONE very soon. First, Reasoning about it is really painful. Secondly, I believe covering all the cases correctly is actually a lot of work, and this PR didn't sufficiently address it.

Commits:

  • "Add warning concerning legacy_timestamp session property" - looks good
  • "Enable tests for non-legacy timestamp semantics" - looks good
  • "Fix current_time timezone offset" - looks good with comments
    • Change commit message to include localtime
    • TODO: The test in its current state is not good. It's meaningless to a reader. Adding a for-test factory method to SqlTime and SqlTimeWithTimeZone that takes LocalTime/OffsetTime will help.
    • TODO: I would like to see see stack representation of Time (wo TZ) changed.
    • TODO: Add comment to explain the stack representation of these types. I think the class level javadoc for TimeType and TimeWithTimeZoneType is a good place. I don't know how to explain this with English unambiguously and concisely. I'll probably go with examples. In my opinion, a table will really help. For example, for TimeType, this table answers all my questions about the legacy representation. But if you think additional line will help, feel free to add them.
/* The table below describes what a stack value represents in legacy representation.
/* Note that the table below is correct regardless of session start time
/* (i.e. whether DST is in effect, whether offset changed for a zone).
/* +---------------------+----------------+------------+
/* |     session tz      |      value     | represents |
/* +---------------------+----------------+------------+
/* | America/Los_Angeles |  7 * 3_600_000 |  Invalid   |
/* | America/Los_Angeles |  8 * 3_600_000 |  00:00:00  |
/* | America/Los_Angeles | 23 * 3_600_000 |  15:00:00  |
/* | America/Los_Angeles | 31 * 3_600_000 |  23:00:00  |
/* | America/Los_Angeles | 32 * 3_600_000 |  Invalid   |
/* | Asia/Tokyo          | -9 * 3_600_000 |  00:00:00  |
/* | Asia/Tokyo          |  8 * 3_600_000 |  17:00:00  |
/* +---------------------+----------------+------------+
  • "Introduce new TIME/TIMESTAMP semantics to SQL types" - comments below
    • Do not deprecate getSessionTimeZoneKey and isLegacyTimestamp. I don't see a particular benefit in deprecating them. I would like to reserve deprecation for methods that should not be used. We will have some.
    • Rename millisUtc field to millis, rename millisUtc parameter in newly added constructors to millisLocal. Do this for both SqlTime/SqlTimestamp.
    • Deprecate getMillisUtc, make it throw if the object was constructed with new semantics. Remove its usage.
    • Add getMillis.
    • TODO: The readability of code (if we choose to restructure SqlTime/SqlTimestamp this way) depends heavily on strikethrough lines added by IDE. I propose moving to factory methods for construction. I understand that refactoring code right now is going to be painful. Therefore, I propose this as a (relatively immediate) follow up item. For backward reasons, you can keep the old constructor, but deprecate it and remove its usage. For example,
      • SqlTime.of(long millisLocal)
      • SqlTime.ofLegacy(long millisUtc, TimeZoneKey sessionTimeZoneKey
  • "Introduce new TIME/TIMESTAMP semantics SPI types" - looks good
  • "Add new TIME/TIMESTAMP parsing/printing to DateTimeUtils" - comments below
    • Do not deprecate "LEGACY_TIMESTAMP_WITHOUT_TIME_ZONE_FORMATTER"
    • You need a LEGACY_TIMESTAMP_WITH_OR_WITHOUT_TIME_ZONE_FORMATTER. And you should use that one for the legacy parseTimestampWithoutTimeZone.
    • The new parseTimestampWithoutTimeZone should use TIMESTAMP_WITH_OR_WITHOUT_TIME_ZONE_FORMATTER instead of TIMESTAMP_WITHOUT_TIME_ZONE_FORMATTER. Details in this comment
    • TODO: Rename TIME_FORMATTER to TIME_WITHOUT_TIME_ZONE_FORMATTER
    • TODO: Rename parseTime to parseTimeLiteral. The original name is a confusing one given TIME is a synonym of TIME WITHOUT TIME ZONE.
    • TODO: Add 'Legacy' suffix to method names in DateTimeUtils instead of using @Deprecated
  • "Add new TIME/TIMESTAMP semantic variant to Expression parsing" - looks good
  • "Use legacy-flag aware wrappers for SqlTime/SqlTimestamp cration in tests" - comments
    • Rename TestingSqlTime to DateTimeTestingUtils
    • I don't understand why you need timestampZone. It's available in the session.
    • I don't understand why you need baseZone. It can be derived from timestampZone.
    • Please don't have overloads that take millis. Remove unnecessary usage of from_unixtime in tests. (optionally in a separate commit)
    • Please don't have overloads that take DateTime. java.time.LocalDateTime will be a better choice.
    • Please don't have overloads that directly take year/month/.../millis. They are unnecessary if LocalDateTime is accepted. (I acknowledge the convenience here. But I don't think it's worth it.)
    • Given the above, for consistency, it's probably a better idea to have sqlTimeOf take LocalTime only.
    • TODO: I feel like this approach is a terrible idea because it makes the expected value not obvious. In general, it is important that tests it clear and obvious with minimum logic. But I'm not sure. I'll have to play with it. Don't worry about this for now.
  • "Fix current_time, localtime & localtimestamp semantics"
    • Please update setStartTime(new DateTime(2017, 3, 1, 10, 0, 0, 0, DateTimeZone.UTC).getMillis()) to use Kathmandu time zone.
    • Please remove unnecessary throws Exception in TestDateTimeFunctions and TestDateTimeFunctionsLegacy.
    • Please add setTimeZoneKey(TIME_ZONE_KEY) to TestDateTimeFunctions, just like you did to TestDateTimeFunctionsLegacy.
    • The montrealTimeZoneKey change doesn't seem to belong to this commit. In addition, I imagine there is a reason for the change. Can you put comment on it?
    • Can you avoid dealing with current_time here? I see that no changes were made to SqlTimeWithTimeZone in this PR. I also see that SqlTimeWithTimeZone.toString does not depend on the flag. As a result, I don't see how any changes to TIME WITH TIME ZONE that involves the flag can be correct. We should have a discussion on current_time. It's really hard to reason about TIME WITH TIME ZONE with political zone in the picture. It's up to you to pick one of the two alternatives below.
    • If you want this fix in, I would be fine that this going in without respecting the flag. I don't think the flag makes sense here.
    • If you don't care much (since TIME WITH TIME ZONE is going away soon), you can remove this change.
  • "Introduce new TIME/TIMESTAMP semantics to scalar functions"
    • This is not an acceptable approach: formatString.toStringUtf8().toLowerCase().contains("z"). Given the complexities, I'm ok with merging it first and then fix it later. Add a TODO along the lines of // TODO: Checking for the existence of z in format string is a major hack. A better approach is to fork DateTimeFormat.forPattern. This code should be fixed as soon as possible.
    • I prefer that testTimeZoneGapIsNotApplied and testTimeZoneGapIsNotApplied be renamed to testTimeZoneGap. This helps in that the corresponding tests can be easily discovered by some one reading the code. Having a comment at the beginning of the method that explains the intended direction of the test for both will be helpful.
    • Ditto for testDaylightTimeSavingSwitchCrossing.
  • "Add new date time semantics to date time cast operators"
    • The old modulo24Hour is still useful for TIME WITH TIME ZONE. It won't become legacy. TODO: Add 'Legacy' suffix to modulo24Hour instead of using @Deprecated.
      • Remove @Deprecated
      • Rename both overload of modulo24Hour because they're doing completely different things and shouldn't share a name.
    • REFERENCE_TIMESTAMP_UTC in TimeOperators is unused. Remove.
    • For castToTimeWithTimeZone, use convertLocalToUTC instead of doing the math directly. It's incorrect whenever you pass local epoch to getOffset. (I understand this means we are basically looking at the zone as if it's 1970-01-01. But I assume political zone for TIME WITH TIME ZONE is going away soon.)
    • For TimeWithTimeZoneOperators.castToTimestamp, let's not worry about REFERENCE_TIMESTAMP_UTC and use convertUtcToLocal? (I understand this means we are basically looking at the zone as if it's 1970-01-01. But I assume political zone for TIME WITH TIME ZONE is going away soon.)
    • For TimestampOperators.castToTimeWithTimeZone, why not use convertLocalToUTC instead of doing the math directly. It's incorrect whenever you pass local epoch to getOffset. (Side note: When we make TIME WITH TIME ZONE support offset zone only, this will be a tricky one. It will need to make sure it chooses the correct offset zone depending on the local epoch.)
    • For TimestampOperators.castToTimestampWithTimeZone, use convertLocalToUTC instead of doing the math directly. It's incorrect whenever you pass local epoch to getOffset.
    • Typo IllegalArgumentException ei
    • TimestampOperators.castFromSlice should accept values with time zone (regardless of flag).
      • Change // Despite the name this accepts literals with and without time zone to // This accepts value with or without time zone and move it to the beginning of the method.
      • Remove most of the catch block in else.
    • For TimestampWithTimeZoneOperators.castToTimestamp, use convertUTCToLocal.
    • TestDateTimeFunctionsBase.testDateDiffTimestamp is a big hack. To constrain the scope of the hack, revert changes to any of the fields. You can add variables to testDateDiffTimestamp wherever necessary. Add a TODO to fix it ASAP. I'll deal with the rest after this PR is merged.
    • Inline your newly introduced assertFunctionString overload in TestTimeBase.
    • Remove testCastToTimeWithTimeZoneWithTZWithRulesChanged since we don't care about political zones in TIME WITH TIME ZONE, and it's hard to think about.
    • Remove testCastToTimestampWithTimeZoneDSTIsNotAppliedWhenTimeCrossesDST for the same reason. (By the way, the method name should say Time instead of Timestamp.)
    • You moved a chunk of TestTimestampBase.testCastFromSlice to TestTimestamp and TestTimestampLegacy. Move it completely, or move none of it. Don't move partially. Don't call super. Do not use sqlTimestampOf (only do so in TestXBase).
    • Your changes to TestTimestampWithTimeZone. Add comments. Use a different method name.
  • "Fix at_timezone(TIME WITH TIME ZONE)" - looks good
    • Change to timestampAtTimeZone and getTimeZoneKeyForOffset is good.
    • I didn't take care in timeAtTimeZone. After all, it doesn't matter given political zone in TIME WITH TIME ZONE is going away soon.
  • "Introduce new TIMESTAMP semantics to to_iso8601 scalar" - looks good with comments
    • Call both test methods toIso8601ForTimestampWithoutTimeZone. You can add comments to the beginning of the methods.
  • "Always use fixed TIME as session base in TestDateTimeFunctions" - looks good
  • "Add tests covering Time types representation" - looks good
    • Move testTimeRepresentationOnDate and valueFromLiteral to the end in TestDateTimeOperators.java.
    • When arguments of a function is split on multiple lines, we generally put the first argument on its own line.
  • "Use Asia/Kathmandu as Cassandra timezone in product-tests" - looks good
  • "Fix timestamp related Cassandra product-tests" - looks good
  • "Fix timestamp related Teradata Function product-tests" - looks good
  • "Use non-JVM timezone in TestDateTimeFunctionsBase" - looks good

@findepi
Copy link
Contributor Author

findepi commented Jun 7, 2018

I rebased on current master without changing anything except resolving a few trivial conflicts.
All new code is fixups added at the end.

Below I am addressing the must-have bullets (and some TODOs as well).

Big thanks for your thorough review

Change commit message to include localtime

I added a squahs commit, so that I don't mess with the commit history yet.

I would like to see see stack representation of Time (wo TZ) changed.

I though this way about TIME WITH TIME ZONE (but ultimately decided not to, to avoid even more chaos).

What representation would you like to see for TIME?

Do not deprecate getSessionTimeZoneKey and isLegacyTimestamp. I don't see a particular benefit in deprecating them. I would like to reserve deprecation for methods that should not be used. We will have some.

In general, I agree we should deprecate methods, switches, flags just because a related functionality is going to be removed.
However, I think SqlTimestamp case is very special. Due to technical limitations (type system, client infrastructure code, etc.) we use single class to represent two quite different things (legacy TIMESTAMP and the new TIMESTAMP). Here, @Deprecated is really helpful to understand when a particular field is going to be set.

Rename millisUtc field to millis, rename millisUtc parameter in newly added constructors to millisLocal. Do this for both SqlTime/SqlTimestamp.

My initial reaction to using millisUtc was same as yours, millisUtc are a bit misleading. However, millisLocal are IMO misleading as well.

The stack representation of a timestamp can be interpreted as referring to UTC: "an instant of time (millis) which in UTC time zone corresponds to the timestamp value being represented". Also, the code actually leverages this fact, eg. com.facebook.presto.spi.type.SqlTimestamp#toString invokes UTC time zone.

Do not deprecate "LEGACY_TIMESTAMP_WITHOUT_TIME_ZONE_FORMATTER"

Done

You need a LEGACY_TIMESTAMP_WITH_OR_WITHOUT_TIME_ZONE_FORMATTER. And you should use that one for the legacy parseTimestampWithoutTimeZone.

Please help me understand. It seemed as unnecessary, as the TIMESTAMP_WITH_OR_WITHOUT_TIME_ZONE_FORMATTER was not changed.

The new parseTimestampWithoutTimeZone should use TIMESTAMP_WITH_OR_WITHOUT_TIME_ZONE_FORMATTER instead of TIMESTAMP_WITHOUT_TIME_ZONE_FORMATTER. Details in this comment (#9385 (comment))

That comment was addressed more explicitly, in com.facebook.presto.type.TimestampOperators#castFromSlice.
Since parseTimestampWithoutTimeZone is a shared method and the comment pertained the CAST from varchar, I though it would be better to change the CAST code.

Rename TestingSqlTime to DateTimeTestingUtils

I added a TODO note, not to mess with commit history at this point.

I don't understand why you need timestampZone. It's available in the session.

It should be the same, but existing code didn't have provisions to ensure that.

I don't understand why you need baseZone. It can be derived from timestampZone.

They represent two different things. See com.facebook.presto.type.TestTimestampWithTimeZoneBase#testCastToTimestamp

Please don't have overloads that take millis. Remove unnecessary usage of from_unixtime in tests. (optionally in a separate commit)

I removed most of it.

Please don't have overloads that take DateTime. java.time.LocalDateTime will be a better choice.

The only reason for having this is to facilitate the transition.
I think removing them needs some code duplication, which generally is OK, but would prefer not to add this to this PR.
Also, it's not easy to remove the usage in com.facebook.presto.type.TestDateTimeOperatorsLegacy#testDaylightTimeSavingSwitchCrossingIsApplied (except by inlining the logic that constructs a millis value).

Please update setStartTime(new DateTime(2017, 3, 1, 10, 0, 0, 0, DateTimeZone.UTC).getMillis()) to use Kathmandu time zone.

This only for construcing session start time (long millis), the time zone used to construct the millis can be anything. By using UTC we have more visible correlation between the date-time component used to construct those millis and the expected value in the test.

Please remove unnecessary throws Exception in TestDateTimeFunctions and TestDateTimeFunctionsLegacy.

Done

Please add setTimeZoneKey(TIME_ZONE_KEY) to TestDateTimeFunctions, just like you did to TestDateTimeFunctionsLegacy.

I instead made TestDateTimeFunctionsLegacy to build session starting with Session.builder(session) which sets both:

  • setTimeZoneKey(TIME_ZONE_KEY)
  • legacy_timestamp

Now the two test classes are consistent.

The montrealTimeZoneKey change doesn't seem to belong to this commit. In addition, I imagine there is a reason for the change. Can you put comment on it?

The commit affects tests for current date, local time(stamp), current time(stamp). (The current date part is signalled by "+ Improve test coverage" in the commit message).

If you want this fix in, I would be fine that this going in without respecting the flag. I don't think the flag makes sense here.

Please elaborate? Is it about current_time and political time zones?
I really do know it would be better to remove political time zones first, but I don't want yet another lengthy rebase & refactor because of that. This is very sensitive area to work with.

If you don't care much (since TIME WITH TIME ZONE is going away soon), you can remove this change.

It's not going away, is it?
You probably meant political time zone in TIME WITH TIME ZONE -- then, as I explained above, my preference would be to merge this as-is and remove political zones in a follow up. I admit this isn't the most clean approach, but this way it would be much more effective from engineering perspective.

This is not an acceptable approach: formatString.toStringUtf8().toLowerCase().contains("z"). Given the complexities, I'm ok with merging it first and then fix it later. Add a TODO ...

Given DateTimeFormat is quite complex, I wouldn't want to copy it over to our code base. At the same time, its pattern is well documented, so we know how to interpret the patterns. I've extracted the check to a method and also make it correct with respect to quotes (', aka "escape for text")

I prefer that testTimeZoneGapIsNotApplied and testTimeZoneGapIsNotApplied be renamed to testTimeZoneGap. This helps in that the corresponding tests can be easily discovered by some one reading the code. Having a comment at the beginning of the method that explains the intended direction of the test for both will be helpful.

Fixed.

Ditto for testDaylightTimeSavingSwitchCrossing.

Fixed

Remove @deprecated

Done

Rename both overload of modulo24Hour because they're doing completely different things and shouldn't share a name.

They are very related -- both serve as a normalization of TIME / TIME WITH TIME ZONE value into the correct range. See how parallel is their use in com.facebook.presto.type.TimestampOperators#castToTime.

REFERENCE_TIMESTAMP_UTC in TimeOperators is unused. Remove.

Fixed

For castToTimeWithTimeZone, use convertLocalToUTC instead of doing the math directly. It's incorrect whenever you pass local epoch to getOffset. (I understand this means we are basically looking at the zone as if it's 1970-01-01. But I assume political zone for TIME WITH TIME ZONE is going away soon.)

To be sure, which castToTimeWithTimeZone you mean? TimeOperators#castToTimeWithTimeZone, right?

Done.

For TimeWithTimeZoneOperators.castToTimestamp, let's not worry about REFERENCE_TIMESTAMP_UTC and use convertUtcToLocal? (I understand this means we are basically looking at the zone as if it's 1970-01-01. But I assume political zone for TIME WITH TIME ZONE is going away soon.)

Let's fix it when we remove political zones form TIME WITH TIME ZONE. I added a clarification comment in the code.

For TimestampOperators.castToTimeWithTimeZone, why not use convertLocalToUTC instead of doing the math directly. It's incorrect whenever you pass local epoch to getOffset. (Side note: When we make TIME WITH TIME ZONE support offset zone only, this will be a tricky one. It will need to make sure it chooses the correct offset zone depending on the local epoch.)

Fixed

For TimestampOperators.castToTimestampWithTimeZone, use convertLocalToUTC instead of doing the math directly. It's incorrect whenever you pass local epoch to getOffset.

Fixed

Typo IllegalArgumentException ei

Fixed

Change // Despite the name this accepts literals with and without time zone to // This accepts value with or without time zone and move it to the beginning of the method.

Done

For TimestampWithTimeZoneOperators.castToTimestamp, use convertUTCToLocal.

Fixed

TestDateTimeFunctionsBase.testDateDiffTimestamp is a big hack. To constrain the scope of the hack, revert changes to any of the fields. You can add variables to testDateDiffTimestamp wherever necessary. Add a TODO to fix it ASAP. I'll deal with the rest after this PR is merged.

I am not sure I follow.

Inline your newly introduced assertFunctionString overload in TestTimeBase.

Done

Remove testCastToTimeWithTimeZoneWithTZWithRulesChanged since we don't care about political zones in TIME WITH TIME ZONE, and it's hard to think about.

I didn't. Let's remove political time zones in a follow up.

Remove testCastToTimestampWithTimeZoneDSTIsNotAppliedWhenTimeCrossesDST for the same reason. (By the way, the method name should say Time instead of Timestamp.)

Ditto. (Fixed)

You moved a chunk of TestTimestampBase.testCastFromSlice to TestTimestamp and TestTimestampLegacy. Move it completely, or move none of it. Don't move partially. Don't call super.

Why? This was done so to make highlight the fact there are some shared test cases, but there are also some special test cases. This helps in that the corresponding tests can be easily discovered by some one reading the code.

Do not use sqlTimestampOf (only do so in TestXBase).

Yes, this is not needed since we know that TestTimestampLegacy uses legacy timestamp semantics. OTOH, this makes the test code more coherent and possible to compare across the different semantics.

Your changes to TestTimestampWithTimeZone. Add comments. Use a different method name.

As above.

Change to timestampAtTimeZone and getTimeZoneKeyForOffset is good.

Thanks

I didn't take care in timeAtTimeZone. After all, it doesn't matter given political zone in TIME WITH TIME ZONE is going away soon.

Good point.

Call both test methods toIso8601ForTimestampWithoutTimeZone. You can add comments to the beginning of the methods.

I called them both testToIso8601.

Move testTimeRepresentationOnDate and valueFromLiteral to the end in TestDateTimeOperators.java.

Right.

When arguments of a function is split on multiple lines, we generally put the first argument on its own line.

Fixed

Beware if you're going to merge #10128 first.

I'm not.

@findepi findepi force-pushed the epic/timestampp/pr branch from 0ee7a18 to 317e8b6 Compare June 7, 2018 14:23
@haozhun
Copy link
Contributor

haozhun commented Jun 12, 2018

For your commits I didn't mention, they are good. For your comments I didn't mention, I agree with them. Sorry but otherwise there would be too many items here.

I consulted the code only to understand your response and my own comments. I didn't read any commits carefully. I depended on your response in the previous comment.

"Fix current_time timezone offset" - looks good with comments

Change commit message to include localtime

I added a squahs commit, so that I don't mess with the commit history yet.

Be aware that the empty commit might disappear when you do a rebase. (That's what happend to me.)

"Introduce new TIME/TIMESTAMP semantics to SQL types" - comments below

Do not deprecate getSessionTimeZoneKey and isLegacyTimestamp. I don't see a particular benefit in deprecating them. I would like to reserve deprecation for methods that should not be used. We will have some.

In general, I agree we should deprecate methods, switches, flags just because a related functionality is going to be removed.
However, I think SqlTimestamp case is very special. Due to technical limitations (type system, client infrastructure code, etc.) we use single class to represent two quite different things (legacy TIMESTAMP and the new TIMESTAMP). Here, @deprecated is really helpful to understand when a particular field is going to be set.

In a TODO item that follows my original comment, I was arguing for adding the word legacy into the name of a factory method (and remove public constructor) because I don't like depending on strikethrough lines for code readability. I believe your argument aligns with my intention. You argue that @Deprecated is necessary for code readability at the moment. And I think you're right. I think the @Deprecated should be kept here until we get the TODO item done.

Rename millisUtc field to millis, rename millisUtc parameter in newly added constructors to millisLocal. Do this for both SqlTime/SqlTimestamp.

My initial reaction to using millisUtc was same as yours, millisUtc are a bit misleading. However, millisLocal are IMO misleading as well.

The stack representation of a timestamp can be interpreted as referring to UTC: "an instant of time (millis) which in UTC time zone corresponds to the timestamp value being represented". Also, the code actually leverages this fact, eg. com.facebook.presto.spi.type.SqlTimestamp#toString invokes UTC time zone.

I requested two things in this comment:

  1. Change name of the field. I agree that neither millisUtc nor millisLocal is correct for the field. That's exactly why I argued for calling the field just millis instead.
  2. Change name of the constructor argument of the new constructor only to millisLocal. I think you're right that calling it millisLocal is misleading. After all, in the new semantics, there is no such thing as millisLocal or millisUtc.

Therefore, I change my request to calling both to millis (and leave the constructor argument of the legacy constructor as millisUtc).

In addition, following this request, my other two comments follows:

  • Make getMillisUtc throw if the object was constructed with new semantics.
  • Add getMillis.

"Add new TIME/TIMESTAMP parsing/printing to DateTimeUtils" - comments below

You need a LEGACY_TIMESTAMP_WITH_OR_WITHOUT_TIME_ZONE_FORMATTER. And you should use that one for the legacy parseTimestampWithoutTimeZone.

Please help me understand. It seemed as unnecessary, as the TIMESTAMP_WITH_OR_WITHOUT_TIME_ZONE_FORMATTER was not changed.

I definitely should have been more clear. It took me myself a while to figure out.

After addressing the next comment (quoted below), you will need then need it. The TIMESTAMP_WITH_OR_WITHOUT_TIME_ZONE_FORMATTER will be using withZoneUTC. And the LEGACY_... will be using withOffsetParsed.

The new parseTimestampWithoutTimeZone should use TIMESTAMP_WITH_OR_WITHOUT_TIME_ZONE_FORMATTER instead of TIMESTAMP_WITHOUT_TIME_ZONE_FORMATTER. Details in this comment (#9385 (comment))

That comment was addressed more explicitly, in com.facebook.presto.type.TimestampOperators#castFromSlice.
Since parseTimestampWithoutTimeZone is a shared method and the comment pertained the CAST from varchar, I though it would be better to change the CAST code.

I disagree. Please change parseTimestampWithoutTimeZone. Here is my reasoning.

Let's first figure out what are the methods we have in DateTimeUtils. It's confusing, not well named, not well documented. I have TODO items to rename them and add comments.

  • There are 3 methods for timestamp and 3 methods for time (plus another copy now that we have old/new semantics). The 3 methods are parseXLiteral, parseXWithoutTimeZone, and parseXWithTimeZone. The confusing part is that they are not consistently named as such. Sometimes a method is just called parseX, and a reader will have a hard time figuring out whether it means parseXLiteral or parseXWithoutTimeZone.
  • In addition, it is understandable that one might be confused about the semantics of parseXWithoutTimeZone and parseXWithTimeZone are. Which one does it mean? 1) parse string into X with/without zone; 2) parse string with/without zone into X. It has to be 1 because 2 didn't even specify what the return representation is. And reading all the implementations, it indeed means 1.

When I reviewed last time, I spent a long time to figure these out. Now, once that's figured out, the supposed behavior of the new parseTimestampWithoutTimezone method becomes obvious. It needs to handle input string with or without time zone.

In addition, it's ok (although not ideal) for parseXLiteral to depend on exception for control flow. But for the other two, this is definitely not. Those can be in tight loops during query execution.

"Use legacy-flag aware wrappers for SqlTime/SqlTimestamp cration in tests" - comments

I don't understand why you need timestampZone. It's available in the session.

It should be the same, but existing code didn't have provisions to ensure that.

I don't understand why you need baseZone. It can be derived from timestampZone.

They represent two different things. See com.facebook.presto.type.TestTimestampWithTimeZoneBase#testCastToTimestamp

Please don't have overloads that take millis. Remove unnecessary usage of from_unixtime in tests. (optionally in a separate commit)

I removed most of it.

Please don't have overloads that take DateTime. java.time.LocalDateTime will be a better choice.

The only reason for having this is to facilitate the transition.
I think removing them needs some code duplication, which generally is OK, but would prefer not to add this to this PR.
Also, it's not easy to remove the usage in com.facebook.presto.type.TestDateTimeOperatorsLegacy#testDaylightTimeSavingSwitchCrossingIsApplied (except by inlining the logic that constructs a millis value).

I make these all TODO items. I also added a TODO item to add comments to put some of your answers here into comments.

"Fix current_time, localtime & localtimestamp semantics"

Please update setStartTime(new DateTime(2017, 3, 1, 10, 0, 0, 0, DateTimeZone.UTC).getMillis()) to use Kathmandu time zone.

This only for construcing session start time (long millis), the time zone used to construct the millis can be anything. By using UTC we have more visible correlation between the date-time component used to construct those millis and the expected value in the test.

The reason it would make more sense to use Kathmandu is because tests a few lines below the setStartTime call are asserted with strings. And all those strings are written in Kathmandu time. None of the tests with that lines has a long millis in the asserts.

I don't understand your statement "By using UTC we have more visible correlation between the date-time component used to construct those millis and the expected value in the test.". Being able to have correlation is the exact reason I want Kathmandu. Using UTC provides no visible correlation between the setStartTime and the expected value in the tests.

If you want this fix in, I would be fine that this going in without respecting the flag. I don't think the flag makes sense here.

Please elaborate? Is it about current_time and political time zones?
I really do know it would be better to remove political time zones first, but I don't want yet another lengthy rebase & refactor because of that. This is very sensitive area to work with.

If you don't care much (since TIME WITH TIME ZONE is going away soon), you can remove this change.

It's not going away, is it?
You probably meant political time zone in TIME WITH TIME ZONE -- then, as I explained above, my preference would be to merge this as-is and remove political zones in a follow up. I admit this isn't the most clean approach, but this way it would be much more effective from engineering perspective.

Sorry, the original comment was indented wrong. They are meant to be subpoints of "It's really hard to reason about TIME WITH TIME ZONE with political zone in the picture. It's up to you to pick one of the two alternatives below."

I was arguing against having anything related to TIME WITH TIMEZONE depend on isLegacyTimestamp. I was suggesting to either 1) not changing current_time, or 2) change it to the new behavior, without checking isLegacyTimestamp.

But I don't feel strongly. After all, support for political zone in TIME WITH TIME ZONE will be removed before the isLegacyTimestamp is considered ready for use.

"Introduce new TIME/TIMESTAMP semantics to scalar functions"

This is not an acceptable approach: formatString.toStringUtf8().toLowerCase().contains("z"). Given the complexities, I'm ok with merging it first and then fix it later. Add a TODO ...

Given DateTimeFormat is quite complex, I wouldn't want to copy it over to our code base. At the same time, its pattern is well documented, so we know how to interpret the patterns. I've extracted the check to a method and also make it correct with respect to quotes (', aka "escape for text")

Your datetimeFormatSpecifiesZone is awesome.

"Add new date time semantics to date time cast operators"

Rename both overload of modulo24Hour because they're doing completely different things and shouldn't share a name.

They are very related -- both serve as a normalization of TIME / TIME WITH TIME ZONE value into the correct range. See how parallel is their use in com.facebook.presto.type.TimestampOperators#castToTime.

From the perspective of their purpose, yes, they are the similar. One normalizes TIME, the other normalizes TIME WITH TIME ZONE.

But from the perspective of what they do, no, I don't think they are the same. Only one of them is doing modulo24Hours. The other one is not. An accurate technical description would be withDate(EPOCH_DATE).

Given their current name is modulo24Hours, that's where my sentiment that "they are doing very different things" come from.

If their names were normalizeTime, it would be a lot more reasonable. However, I would still prefer
normalizeTimeWithTimeZone and normalizeTimeWithoutTimeZone instead.

I have put the rename in the TODO list below. I think it's reasonable to keep it as is for now to avoid messing up the commits.

TimestampOperators.castFromSlice should accept values with time zone (regardless of flag).
Remove most of the catch block in else.

(This comment was not addressed or replied to)

In an earlier comment, I explained why I believe that the logic for allowing both string with/timeout time zone should be in DateTimeUtils. Once parseTimestampWithoutTimeZone is implemented as such, the else branch would no longer be necessary.

TestDateTimeFunctionsBase.testDateDiffTimestamp is a big hack. To constrain the scope of the hack, revert changes to any of the fields. You can add variables to testDateDiffTimestamp wherever necessary. Add a TODO to fix it ASAP. I'll deal with the rest after this PR is merged.

I am not sure I follow.

I made this a TODO item. Don't worry about it for now.

I think It's a bad idea to have the TIMESTAMP change based on which class it is.

For the sake of sanity, I propse some rules for these tests. Rule 1:

  • For scenarios where behavior is the same between legacy and new, tests should live in TestXBase. For scenarios where behavior is different between legacy and new, it should live in TestX and TestXLegacy.

Applying this rule, the TIMESTAMP constant field should have type LocalDateTime. This would be sufficient for cases where the behavior didn't change. (That's most cases.) For the cases where the behavior does change, since those tests would live in two different classes, they can have different constant fields however they like.

You moved a chunk of TestTimestampBase.testCastFromSlice to TestTimestamp and TestTimestampLegacy. Move it completely, or move none of it. Don't move partially. Don't call super.

Why? This was done so to make highlight the fact there are some shared test cases, but there are also some special test cases. This helps in that the corresponding tests can be easily discovered by some one reading the code.

Here, I think it would make sense to rename testCastFromSlice in TestTimestampBase to testCastFromSliceWithoutTimeZone, and rename testCastFromSlice in TestTimestamp and TestTimestampLegacy to testCastFromSliceWithTimeZone.

For the sake of sanity, I propose rule 2 (which is really just an emphasis of rule 1:

  • A test should not live in TestXBase and TestX/TestXLegacy at the same time.

From practical perspective, having some methods delegate to super (and some that doesn't) makes the tests error prone. It will also likely lead to more inconsistency as people can't tell based on existing code how they should add new tests.

From the logical perspective, it will also make tests easier to find when there isn't hidden structures.

I further propose rule 3 (Don't worry about it now. I have put it in TODO.)

  • Any test in TestX and TestXLegacy should show up in pairs. To make it easier to find the counterpart and to avoid orphans, the test should be declared as abstract in TestXBase.

Your changes to TestTimestampWithTimeZone. Add comments. Use a different method name.

As above.

Please change TestTimestampWithTimeZone to avoid using super. Put testCastToTime and testCastToTimeWithTimeZone in both TestTimestampWithTimeZone and TestTimestampWithTimeZoneLegacy, and remove them from TestTimestampWithTimeZoneBase.

In addition, the additional test cases you added for TestTimestampWithTimeZone should also show up in TestTimestampWithTimeZoneLegacy, even though the expected value will likely be different.

"Introduce new TIMESTAMP semantics to to_iso8601 scalar" - looks good with comments"

Call both test methods toIso8601ForTimestampWithoutTimeZone. You can add comments to the beginning of the methods.

I called them both testToIso8601.

This is problematic. You have testToISO8601 in the base test class, and testToIso8601 (notice case of SO is different) in the derivative test classes.

Please rename to testToIso8601ForTimestampWithoutTimeZone.

Outstanding TODOs

https://gist.github.com/haozhun/ec833dd76b62a7df020a3b77ddf56a65

@haozhun haozhun assigned findepi and unassigned haozhun Jun 12, 2018
@findepi
Copy link
Contributor Author

findepi commented Jun 21, 2018

"Fix current_time timezone offset"

Be aware that the empty commit might disappear when you do a rebase. (That's what happend to me.)

Yeah..
I just squashed all the fixups that you have already seen. The empty commit was just a "rename commit message" marker and was applied.

In short, the following commits are NEW

  • 77b7338dcd - fixup! Fix current_time, localtime & localtimestamp semantics
  • afd7d8c29c - Rename SqlTime#millisUtc to millis
  • 2ee4ccbd9d - Rename SqlTimestamp#millisUtc to millis
  • c8604eebb0 - Rename DateTimeUtils#parseTime to parseTimeLiteral
  • 333eaec98a - fixup! Use non-JVM timezone in TestDateTimeFunctionsBase
  • c3437c90a8 - fixup! Introduce new TIMESTAMP semantics to to_iso8601 scalar

"Introduce new TIME/TIMESTAMP semantics to SQL types"

Therefore, I change my request to calling both to millis (and leave the constructor argument of the legacy constructor as millisUtc). [...]

I added this as an additional commit rather than reworking the existing ones.

"Add new TIME/TIMESTAMP parsing/printing to DateTimeUtils"

After addressing the next comment (quoted below), you will need then need it. The TIMESTAMP_WITH_OR_WITHOUT_TIME_ZONE_FORMATTER will be using withZoneUTC. And the LEGACY_... will be using withOffsetParsed.
...
Now, once that's figured out, the supposed behavior of the new parseTimestampWithoutTimezone method becomes obvious. It needs to handle input string with or without time zone.

I agree that methods are not consistently named. I renamed parseTime to parseTimeLiteral.

However, I am not convinced about the desired semantics of parseTimestampWithoutTimezone.
For the sake of reasoning about consistency, I ignore legacy methods, so only the two remain: parseTimestampWithoutTimezone and parseTimeWithoutTimezone. Both of these methods can parse text containing a value (without zone). Both reject text containing a value with zone.

Also, that is also a behavior that I want these methods to have. parseXWithoutTimezone should be consequently "without time zone", should not do some magic time zone stripping under the hood.

We need elasticity in two places

  • parsing & assigning type to TIME/TIMESTAMP literals
  • the CAST to timestamp

Now, assining type uses XhasTimeZone.
Parsing literals is handled by parseXLiteral (for both TIME and TIMESTAMP, the parseXLiteral does: try parseWithZone else parseWithoutzone).
Making parseTimestampWithoutTimezone accept text values wiht zone would make timestampHasTimeZone and parseTimestampLiteral literal.

Thus, elasticity in parseTimestampWithoutTimezone is not required by consistency nor by reusability (quite contrarily). To me, the only logical place to apply elasticity is "somewhere else". This can be a new method (parseTimestampFromTimestampWithOrWithoutTimezone). For now, I chose not to define this method and implement this logic directly in the said CAST.

"Fix current_time, localtime & localtimestamp semantics"

The reason it would make more sense to use Kathmandu is because tests a few lines below the setStartTime call are asserted with strings. And all those strings are written in Kathmandu time. None of the tests with that lines has a long millis in the asserts.

i see this now.
Because the adjacent lines are later changed, I added fixup commits "inline".

I was arguing against having anything related to TIME WITH TIMEZONE depend on isLegacyTimestamp. I was suggesting to either 1) not changing current_time, or 2) change it to the new behavior, without checking isLegacyTimestamp.
But I don't feel strongly. After all, support for political zone in TIME WITH TIME ZONE will be removed before the isLegacyTimestamp is considered ready for use.

Yes, that's me thinking. What we have now is, by necessity, an approximation of the final solution. Let's revisit TIME WITH TIME ZONE as a follow-up.

"Introduce new TIME/TIMESTAMP semantics to scalar functions"

Your datetimeFormatSpecifiesZone is awesome.

Thanks! I appreciate your feedback :)

"Add new date time semantics to date time cast operators"

Rule 1: For scenarios where behavior is the same between legacy and new, tests should live in TestXBase. For scenarios where behavior is different between legacy and new, it should live in TestX and TestXLegacy.
Applying this rule, the TIMESTAMP constant field should have type LocalDateTime. This would be sufficient for cases where the behavior didn't change. (That's most cases.) For the cases where the behavior does change, since those tests would live in two different classes, they can have different constant fields however they like.

I like this approach.
I didn't apply this, since you put it on the TODO list.

Here, I think it would make sense to rename testCastFromSlice in TestTimestampBase to testCastFromSliceWithoutTimeZone, and rename testCastFromSlice in TestTimestamp and TestTimestampLegacy to testCastFromSliceWithTimeZone.

note: when doing this renames, we should also change *FromSlice to *FromVarchar (in tests & in impl code)

From practical perspective, having some methods delegate to super (and some that doesn't) makes the tests error prone.

I think I like tests delegating to super because it clearly says "hey, i support more cases". But yes, if this is not applied consistently, it might be overall not helpful.

Please change TestTimestampWithTimeZone to avoid using super. Put testCastToTime and testCastToTimeWithTimeZone in both TestTimestampWithTimeZone and TestTimestampWithTimeZoneLegacy, and remove them from TestTimestampWithTimeZoneBase.
In addition, the additional test cases you added for TestTimestampWithTimeZone should also show up in TestTimestampWithTimeZoneLegacy, even though the expected value will likely be different.

Would it be OK to enlist this on a TODO list as well?
If I were to do this now, I would prefer to do this as a new commit on top anyway.

@findepi findepi force-pushed the epic/timestampp/pr branch from 317e8b6 to c3437c9 Compare June 21, 2018 16:53
@findepi findepi assigned haozhun and unassigned findepi Jun 21, 2018
@haozhun
Copy link
Contributor

haozhun commented Jun 22, 2018

I feel pretty strongly on the topic of parseTimestampWithoutTimezone.

I reiterated the "consistency" point. But my strong feeling is mostly about "utility".

I am not convinced about the desired semantics of parseTimestampWithoutTimezone. For the sake of reasoning about consistency, I ignore legacy methods, so only the two remain: parseTimestampWithoutTimezone and parseTimeWithoutTimezone. Both of these methods can parse text containing a value (without zone). Both reject text containing a value with zone.

There's also parseTimeWithTimeZone and parseTimestampWithTimeZone. For consistency, if parseTimestampWithoutTimeZone rejects inputs with time zone, those two should reject inputs without time zone.

Also, that is a behavior that I want these methods to have. parseXWithoutTimezone should be consequently "without time zone", should not do some magic time zone stripping under the hood.

Earlier, I conceded that parseXWithoutTimezone is ambiguous. However, if one is to go for the literal meaning, "parse XWithoutTimezone" would be interpreted as "parse something into XWithoutTimezone". (This does not contadict my position that it's ambiguous. If I see "parse XWithoutTimezone" without comments, I would still consider it ambiguous because people don't necessarily express what they mean accurately. The other interpretation has a non negligible likeliness here.)

We need elasticity in two places

  • assigning type to TIME/TIMESTAMP literals & parsing such literals (rephrase mine)
  • the CAST to timestamp

Now, assigning type uses XhasTimeZone.

This talks about the first half of bullet point 1.

Parsing literals is handled by parseXLiteral (for both TIME and TIMESTAMP, the parseXLiteral does: try parseWithZone else parseWithoutzone).

This talks about the second half of bullet point 1. The legacy parseTimestampLiteral didn't delegate to parseTimestampWithoutTimeZone. And I don't think why the new one need to. (Now, I acknowledge that it will be nice if it can. I just don't find it important. Delegation doesn't improve readability or succinctness here.) By the way, using delegation here would be correct, it would just be too subtle for readers.

Making parseTimestampWithoutTimezone accept text values with zone would make timestampHasTimeZone and parseTimestampLiteral literal.

If I'm reading this correctly, this is repeating "that is a behavior that I want these methods to have".

It looks like you forgot bullet point 2.

There is one more reason here. I didn't mention it because I thought my other argument were sufficient. With your changes, TimestampOperators.castFromSlice will be depending on exception for control flow. This will negatively affect performance, presumably quite significantly. (Note parseXLiteral is different because it only runs during planning.)

Thus, elasticity in parseTimestampWithoutTimezone is not required by consistency nor by reusability (quite contrarily).

To me, the only logical place to apply elasticity is "somewhere else". This can be a new method (parseTimestampFromTimestampWithOrWithoutTimezone). For now, I chose not to define this method and implement this logic directly in the said CAST.

An accurate name would be parseTimestampWithoutTimeZoneFromStringWithOrWithoutTimeZone. Note the name starts with parseTimestampWithoutTimeZone.

There are 3 callsites of parseTimestampLiteral (or parseTimeLiteral) today.

  • DateTimeUtils.parseXLiteral: Either semantics work. The semantics in your proposal is more logically correct. But both works. I think it should inline, given it's internal in DateTimeUtils.
  • XOperators.castFromSlice: This wants "parse X wo tz from string w or wo tz".
  • SqlToRowExpressionTranslator.Visitor.visitXLiteral: Either semantics work. The semantics in your proposal is more logically correct. But both works.

Therefore, from a utility perspective, it's unnecessary to go down the path having separate methods for parseTimestampWithoutTimeZoneFromStringWithOrWithoutTimeZone and parseTimestampWithoutTimeZoneFromStringWithoutTimeZone.

By the way, I would like to see WithTimeZone and WithoutTimeZone shortened to WithTZ and WithoutTZ respectively throughout the code base. Names are too long right now. (I'm not trying to pursue this here. But I would like to know how you feel.)

@findepi findepi force-pushed the epic/timestampp/pr branch 2 times, most recently from c822149 to b928ea3 Compare June 23, 2018 21:08
@findepi
Copy link
Contributor Author

findepi commented Jun 23, 2018

@haozhun i applied changes as discussed + I added some (hopefully) clarification comments for parse methods. Please have a look.

*/
public static long parseTimestampWithoutTimeZone(String value)
{
return TIMESTAMP_WITHOUT_TIME_ZONE_FORMATTER.parseMillis(value);
LocalDateTime localDateTime = TIMESTAMP_WITH_OR_WITHOUT_TIME_ZONE_FORMATTER.parseLocalDateTime(value);
return localDateTime.toDateTime(UTC).getMillis();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All we need is org.joda.time.LocalDateTime#getLocalMillis, but it's not accessible. @haozhun any better idea than what i have here?

fiedukow and others added 16 commits June 26, 2018 10:37
AT TIME ZONE does not take into account the fact, that TIME WITH TIME ZONE
does not represent real millisecond UTC in millisUtc field.
In fact, this field contains millisUtc assuming offset of TIME ZONE that was
valid on 1970-01-01. Such representation allows to simply represent local time
with time zone id. That means that TIME WITH TIME ZONE that represents eg.
'10:00:00.000 Asia/Kathmandu' will always represent this exact value.
However mapping of such value to other TZ (including UTC) may differ over time.
Eg. Asia/Kathmandu switched time zone offset on 1986 from +5:30 to +5:45.
Result of query like:
`SELECT time_with_tz_column FROM table;`
Will always be the same, however:
`SELECT time_with_tz_column AT TIME ZONE 'UTC' FROM table;`
Will yail differnet value in 1980 and 2000 after changes from this commit.

This is done to use current offset of TZ as function that stucked in 1970-01-01
offsets seems useless.
This is not perfect solution and is not fully aligned with standard, but standard
behavior cannot be achieved with current TIME WITH TIME ZONE representation, as
we are not able to read TZ offset from TIME WITH TIME ZONE itselve (at least not
in all cases).
Using this will allow to write simpler tests and avoid intermittent
error caused by DST or changes in time zone policies.

Without this change some test would require to rewrite to much logic
from implementation making them less usefull.
For consistency with `parseTimestampLiteral`.
{
LocalDateTime localDateTime = TIMESTAMP_WITH_OR_WITHOUT_TIME_ZONE_FORMATTER.parseLocalDateTime(value);
try {
return (Long) getLocalMillis.invoke(localDateTime);
Copy link
Contributor

@haozhun haozhun Jun 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

invokeExact is @PolymorphicSignature, thus can return primitives directly.

If you change (Long) to (long), you will be able to avoid boxing, and you will be able to use the more efficient invokeExact.

            return (long) getLocalMillis.invokeExact(localDateTime);

@findepi findepi force-pushed the epic/timestampp/pr branch from 2e6191c to eb3d770 Compare June 26, 2018 20:01
@findepi
Copy link
Contributor Author

findepi commented Jun 26, 2018

AC

@findepi findepi merged commit bd41fe1 into prestodb:master Jun 27, 2018
@findepi findepi deleted the epic/timestampp/pr branch June 27, 2018 19:33
@findepi
Copy link
Contributor Author

findepi commented Jun 27, 2018

@fiedukow good job!

@losipiuk thank you for all the discussions around this!

@haozhun thank you for your thorough review. I could press the merge button with confidence :)

Merged! 🎉 Time for a 🍰

@kokosing
Copy link
Contributor

Congratulations!

@fiedukow
Copy link
Member

GJ & GZ all I seriously thought that will never happen :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants